Skip to content

✨ Update logic for HostClaims#3152

Open
pierrecregut wants to merge 2 commits intometal3-io:mainfrom
pierrecregut:feature/hostclaim-update
Open

✨ Update logic for HostClaims#3152
pierrecregut wants to merge 2 commits intometal3-io:mainfrom
pierrecregut:feature/hostclaim-update

Conversation

@pierrecregut
Copy link
Copy Markdown
Contributor

What this PR does / why we need it:

Update ensures the synchronization between HostClaim and BareMetalHost.

  • Copy HostClaim secrets to BareMetalHost namespace
  • Ensure propagation of reboot annotations
  • Synchronize hostclaim status (mirrors BareMetalHost conditions)

Checklist:

  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • E2E tests have been added, if necessary.
  • Integration tests have been added, if necessary.

@metal3-io-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rozzii for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@metal3-io-bot metal3-io-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 10, 2026
Comment thread pkg/hostclaim/hostclaim_manager.go Outdated
return nil
}

// Update updates a machine and is invoked by the Machine Controller.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outdated docstrings

Comment thread pkg/hostclaim/hostclaim_manager.go Outdated

// Update updates a machine and is invoked by the Machine Controller.
func (m *HostManager) Update(ctx context.Context) error {
m.Log.Info("Updating machine")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please find and replace all cases of "machine" here

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this should be V(1) too unless we detect that actual changes are needed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return hideConflictError(err)
}

// transient rebootAnnotation was successfully transmitted. We can delete it on HostClaim.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit unfortunate: this way it's hard to track when reboot is actually executed. But I don't see a good way out.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes and if the HostClaim patch fails we may even reboot twice in rare cases.

m.SetConditionHostToFalse(
metal3api.AssociatedCondition, metal3api.MissingBareMetalHostReason,
"BareMetalHost associated to the claim not found")
return errors.New("BareMetalHost not found")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I"m curious what the expected remediation is. Is a user supposed to delete and re-create the claim? Or is the controller supposed to provide another BMH?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the consumerRef case status.BaremetalHost is reset to nil, so the association logic is performed again (because in the controller introduced later the defer patch will be done as in m3m controller). This was not done here and I have added it.

Comment thread pkg/hostclaim/hostclaim_manager.go Outdated
}
err := m.client.Get(ctx, key, &bmh)
if k8serrors.IsNotFound(err) {
m.Log.Info("Annotated host not found", "bmh", bmhRef.Name, "bmhNamespace", bmhRef.Namespace)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/annotated/linked/ ?

if !consumerRefMatches(bmh.Spec.ConsumerRef, hostClaim) {
m.Log.Info("The consumer ref does not point to the hostClaim", "consumerRef", bmh.Spec.ConsumerRef)
hostClaim.Status.BareMetalHost = nil
return nil, ErrNoBMH
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question here: what's the expected action in this case? Especially since we only provide a generic error message (which is understandable)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above.

sourceRef *corev1.SecretReference,
namespace string,
hostName string,
) (*corev1.SecretReference, error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call should exit early if the target namespace is the same as the source namespace (a non-multi-tenant case).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

conditions.SetMirrorCondition(bmh, m.HostClaim, metal3api.ProvisionedCondition)
m.SetConditionHostToTrue(metal3api.AssociatedCondition, metal3api.BareMetalHostAssociatedReason)

if !equality.Semantic.DeepEqual(m.HostClaim.Status, hostOld) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this is where logging something would actually be useful

bmh.Annotations = map[string]string{}
}

if syncReboot(m.HostClaim.Annotations, bmh.Annotations) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also sync the detachment annotation (also maybe without the "force" flag being introduced in #2955 - i.e. sanitize the value). Detaching is commonly used in my world (e.g. by ACM) to prevent Ironic from further touching the deployed instance.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dtantsur I have not modified this part yet. I want to be sure that we want to give control of the Detach annotation to the customer.... An analogy of hostclaim is taking a taxi rather than renting/owning a car: you do not have the keys. With detach, you ask the taxi driver to throw away the keys.... but you still do not have them. As it is done, I do not see a way of sharing the right to detach by both the customer and the infra owner (no consumerOverride).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could have a knob to enable/disable it per HDP. But it's quite important for users to be able isolate their instance from Ironic's power management. It also allows reducing the load on Ironic.

I guess we'd need to re-attach on HostClaim deletion.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a way to authorize Detach in the HostDeployPolicy. (At least one matching policy must have the allowsDetachedMode flag set). I have also added a way to let admin detach the BMH independently of the HostClaim with a manager flag.
I still do not understand how it can really be used in practice: how is the user managing the power setting without access to the BMC ? He can reboot or halt through the OS, but the first should not interfere with Ironic and if he uses the second, he will need Ironic to power on again.
There is some kind of sanitization and the way it is done, it should "transparently" leaves off force flag when it is merged.

I would keep it as a separate revertible commit for all the reasons above.

Comment thread pkg/hostclaim/hostclaim_manager.go Outdated
}

if err != nil {
m.SetConditionHostToFalse(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe don't update the condition on conflict?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Update ensures the synchronization between HostClaim and BareMetalHost.
* Copy HostClaim secrets to BareMetalHost namespace
* Ensure propagation of reboot annotations
* Synchronize hostclaim status (mirrors BareMetalHost conditions)

Co-authored-by: Pierre Crégut <pierre.cregut@orange.com>
Co-authored-by: Laurent Roussarie <laurent.roussarie@orange.com>
Signed-off-by: Pierre Crégut <pierre.cregut@orange.com>
@pierrecregut pierrecregut force-pushed the feature/hostclaim-update branch from af309eb to 8a95ab0 Compare April 10, 2026 16:06
@pierrecregut pierrecregut force-pushed the feature/hostclaim-update branch from cbdae05 to a268557 Compare April 28, 2026 09:19
When a HostClaim has a detached annotation, we propagate it to the
BareMetalHost iff there is a HostDeployPolicy compatible with the claim
that allows such a propagation.

We add a manager flag in the "arguments" of the annotation. HostClaim
do not set it to true and if the annotation does not appear in the
HostClaim, it is not removed from the BareMetalHost if the manager
field is set to true.

This way the administrator can still force detachment.

Signed-off-by: Pierre Crégut <pierre.cregut@orange.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates HostClaim/BareMetalHost synchronization logic, including secret propagation across namespaces, reboot annotation propagation, detached-mode handling gated by HostDeployPolicy, and mirroring BareMetalHost conditions into HostClaim status.

Changes:

  • Add HostManager Update() flow to synchronize BMH spec (secrets/image/custom deploy/online), reboot annotations, detached annotation, and HostClaim status.
  • Introduce allowsDetachedMode in HostDeployPolicy API + CRDs to gate HostClaim-driven detached mode.
  • Expand HostClaim manager unit tests to cover spec syncing, reboot propagation, detached sync, and status mirroring.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
pkg/hostclaim/hostclaim_manager.go Implements HostClaim↔BMH synchronization (secrets, reboot/detach annotations, status mirroring) and refactors policy evaluation.
pkg/hostclaim/hostclaim_manager_test.go Adds test coverage for new sync behaviors (setBmhSpec, reboot propagation, detached sync, status updates, Update()).
internal/testutil/common.go Fixes builder typo (SetNetworkData) and adds HostDeployPolicy builder helper for detached mode.
apis/metal3.io/v1alpha1/hostdeploypolicy_types.go Adds AllowsDetachedMode to HostDeployPolicySpec.
config/base/crds/bases/metal3.io_hostdeploypolicies.yaml CRD schema update for allowsDetachedMode.
config/render/capm3.yaml Rendered CRD/schema update for allowsDetachedMode.
apis/metal3.io/v1alpha1/baremetalhost_types.go Extends detached-annotation args with manager flag to prevent HostClaim from deleting admin-set annotations.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

return false, err
}
inputArgs := metal3api.DetachedAnnotationArguments{}
_ = json.Unmarshal([]byte(annotValue), &inputArgs)
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syncDetached() ignores JSON parse errors for the HostClaim detached annotation (_ = json.Unmarshal(...)). If the value is malformed, this will silently treat it as empty args and still write an annotation to the BMH. Handle the unmarshal error and return it (or at least skip syncing) to avoid unexpected behavior from invalid user input.

Suggested change
_ = json.Unmarshal([]byte(annotValue), &inputArgs)
if err := json.Unmarshal([]byte(annotValue), &inputArgs); err != nil {
return false, err
}

Copilot uses AI. Check for mistakes.
// a HostClaim.
PausedAnnotationValue = "metal3.io/hostclaim"
// rebootDomain is the domain of metal3 reboot annotations.
rebootDomain = "reboot.metal3.io"
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebootDomain duplicates the existing API constant metal3api.RebootAnnotationPrefix (also "reboot.metal3.io"). Using the shared constant avoids drift and keeps validation/behavior aligned across the codebase.

Suggested change
rebootDomain = "reboot.metal3.io"
rebootDomain = metal3api.RebootAnnotationPrefix

Copilot uses AI. Check for mistakes.
log := m.Log.WithValues("policyNamespace", hostDeployPolicy.Namespace, "policyName", hostDeployPolicy.Name)
constraints := hostDeployPolicy.Spec.HostClaimNamespaces
if constraints == nil {
m.Log.V(1).Info("Ignoring HostDeployPolicy without constraint")
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In checkPolicy(), the "Ignoring HostDeployPolicy without constraint" log uses m.Log instead of the local log that includes policy name/namespace fields, so the contextual fields are lost.

Suggested change
m.Log.V(1).Info("Ignoring HostDeployPolicy without constraint")
log.V(1).Info("Ignoring HostDeployPolicy without constraint")

Copilot uses AI. Check for mistakes.
Comment on lines +494 to +500
Entry("set meta-data/network-data (overide)", testCaseSetBMHSpec{
MetaData: NewSecret("s1", HostclaimNamespace).SetData(map[string][]byte{"f": []byte("mdt")}).Build(),
NetworkData: NewSecret("s2", HostclaimNamespace).SetData(map[string][]byte{"f": []byte("nwdt")}).Build(),
BMHMetaData: NewSecret("bmh-metadata", "ns").SetData(map[string][]byte{"f": []byte("other")}).Build(),
BMHNetworkData: NewSecret("bmh-networkdata", "ns").SetData(map[string][]byte{"f": []byte("other")}).Build(),
}),
Entry("reset meta-data/network-data (overide)", testCaseSetBMHSpec{
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling: entry title uses "overide"; should be "override" for consistency/readability in test output.

Suggested change
Entry("set meta-data/network-data (overide)", testCaseSetBMHSpec{
MetaData: NewSecret("s1", HostclaimNamespace).SetData(map[string][]byte{"f": []byte("mdt")}).Build(),
NetworkData: NewSecret("s2", HostclaimNamespace).SetData(map[string][]byte{"f": []byte("nwdt")}).Build(),
BMHMetaData: NewSecret("bmh-metadata", "ns").SetData(map[string][]byte{"f": []byte("other")}).Build(),
BMHNetworkData: NewSecret("bmh-networkdata", "ns").SetData(map[string][]byte{"f": []byte("other")}).Build(),
}),
Entry("reset meta-data/network-data (overide)", testCaseSetBMHSpec{
Entry("set meta-data/network-data (override)", testCaseSetBMHSpec{
MetaData: NewSecret("s1", HostclaimNamespace).SetData(map[string][]byte{"f": []byte("mdt")}).Build(),
NetworkData: NewSecret("s2", HostclaimNamespace).SetData(map[string][]byte{"f": []byte("nwdt")}).Build(),
BMHMetaData: NewSecret("bmh-metadata", "ns").SetData(map[string][]byte{"f": []byte("other")}).Build(),
BMHNetworkData: NewSecret("bmh-networkdata", "ns").SetData(map[string][]byte{"f": []byte("other")}).Build(),
}),
Entry("reset meta-data/network-data (override)", testCaseSetBMHSpec{

Copilot uses AI. Check for mistakes.
}
bmh := bmhBuilder.Build()
objects = append(objects, hostClaim, bmh)
// Add secrets if they exists
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar in comment: "Add secrets if they exists" should be "Add secrets if they exist".

Suggested change
// Add secrets if they exists
// Add secrets if they exist

Copilot uses AI. Check for mistakes.
namespace string,
hostName string,
) (*corev1.SecretReference, error) {
if namespace == bmh.Namespace {
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

synchronizeDataSecret() will panic when sourceRef is nil and the HostClaim namespace matches the BMH namespace: sourceRef.DeepCopy() dereferences a nil pointer. Return nil, nil (or just nil) when sourceRef == nil in this branch.

Suggested change
if namespace == bmh.Namespace {
if namespace == bmh.Namespace {
if sourceRef == nil {
return nil, nil
}

Copilot uses AI. Check for mistakes.
Comment on lines +824 to +828
for _, hdp := range policiesInNs.Items {
if accept, err := m.checkPolicy(&hdp, hostNs, nsLabels); err != nil {
return nil, err
} else if accept {
compatiblePolicies = append(compatiblePolicies, &hdp)
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetCompatiblePolicies() appends &hdp where hdp is the range variable; all returned pointers will reference the same item (the last policy). Iterate by index (e.g., for i := range policiesInNs.Items) or copy into a new variable before taking the address.

Suggested change
for _, hdp := range policiesInNs.Items {
if accept, err := m.checkPolicy(&hdp, hostNs, nsLabels); err != nil {
return nil, err
} else if accept {
compatiblePolicies = append(compatiblePolicies, &hdp)
for i := range policiesInNs.Items {
hdp := &policiesInNs.Items[i]
if accept, err := m.checkPolicy(hdp, hostNs, nsLabels); err != nil {
return nil, err
} else if accept {
compatiblePolicies = append(compatiblePolicies, hdp)

Copilot uses AI. Check for mistakes.
Comment on lines +398 to +401
if k8serrors.IsNotFound(err) {
log.Error(err, "Missing source Secret for synchronization from claim to BMH", "source", sourceKey)
return nil, RequeueAfterError{RequeueAfter: HostClaimRequeueDelay}
}
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the source Secret is not found, this returns a bare RequeueAfterError, whose message becomes just "requeue in 30s". Consider wrapping it with context about which secret is missing (e.g., include sourceKey in the error) so the HostClaim condition/message is actionable.

Copilot uses AI. Check for mistakes.
Comment on lines +494 to +500
Entry("set meta-data/network-data (overide)", testCaseSetBMHSpec{
MetaData: NewSecret("s1", HostclaimNamespace).SetData(map[string][]byte{"f": []byte("mdt")}).Build(),
NetworkData: NewSecret("s2", HostclaimNamespace).SetData(map[string][]byte{"f": []byte("nwdt")}).Build(),
BMHMetaData: NewSecret("bmh-metadata", "ns").SetData(map[string][]byte{"f": []byte("other")}).Build(),
BMHNetworkData: NewSecret("bmh-networkdata", "ns").SetData(map[string][]byte{"f": []byte("other")}).Build(),
}),
Entry("reset meta-data/network-data (overide)", testCaseSetBMHSpec{
Copy link

Copilot AI Apr 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling: entry title uses "overide"; should be "override" for consistency/readability in test output.

Suggested change
Entry("set meta-data/network-data (overide)", testCaseSetBMHSpec{
MetaData: NewSecret("s1", HostclaimNamespace).SetData(map[string][]byte{"f": []byte("mdt")}).Build(),
NetworkData: NewSecret("s2", HostclaimNamespace).SetData(map[string][]byte{"f": []byte("nwdt")}).Build(),
BMHMetaData: NewSecret("bmh-metadata", "ns").SetData(map[string][]byte{"f": []byte("other")}).Build(),
BMHNetworkData: NewSecret("bmh-networkdata", "ns").SetData(map[string][]byte{"f": []byte("other")}).Build(),
}),
Entry("reset meta-data/network-data (overide)", testCaseSetBMHSpec{
Entry("set meta-data/network-data (override)", testCaseSetBMHSpec{
MetaData: NewSecret("s1", HostclaimNamespace).SetData(map[string][]byte{"f": []byte("mdt")}).Build(),
NetworkData: NewSecret("s2", HostclaimNamespace).SetData(map[string][]byte{"f": []byte("nwdt")}).Build(),
BMHMetaData: NewSecret("bmh-metadata", "ns").SetData(map[string][]byte{"f": []byte("other")}).Build(),
BMHNetworkData: NewSecret("bmh-networkdata", "ns").SetData(map[string][]byte{"f": []byte("other")}).Build(),
}),
Entry("reset meta-data/network-data (override)", testCaseSetBMHSpec{

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants